PiQA: An Algebra for Querying Protein Data Sets

Authors

  • Sandeep Tata
  • Jignesh M. Patel
Abstract

Life science researchers frequently need to query large protein data sets in a variety of ways. Protein data has a rich structure that includes a protein's primary structure, described as a sequence of amino acids, and its secondary structure, described as a sequence of folding patterns. Both structures are important: the amino acid sequence is often used to find homologous proteins, and the secondary structure can provide important hints about protein function. While there are tools for querying each of these structures independently, there are no tools for declarative querying on both. Even the tools that allow querying on one of these structures are not based on any formal algebra, and as a result they require complex rewriting of the tool's programming logic when the “query evaluation plan” changes. This paper introduces PiQA, a Protein Query Algebra, which provides a rich set of algebraic operations on both the primary and secondary structure of proteins. Using PiQA, one can pose complex queries involving both the primary and the secondary structure of proteins. In addition, simple existing tools that query only the primary structure, such as BLAST, can also be expressed in this algebra. PiQA is an important first step toward an algebra that can form the basis of a declarative language for querying protein data sets.

Publication date: 2003